Abstract

Background: Frankia is a genus of soil actinobacteria forming nitrogen-fixing root-nodule symbiotic relationships 
with non-leguminous woody plant species, collectively called actinorhizals, from eight dicotyledonous families. 
Frankia strains are classified into four host-specificity groups (HSGs), each of which exhibits a distinct host range. 
Genome sizes of representative strains of Alnus, Casuarina, and Elaeagnus HSGs are highly diverged and are
positively correlated with the size of their host ranges. 

Results: The content and size of 12 Frankia genomes were investigated by in silico comparative genome 
hybridization and pulsed-field gel electrophoresis, respectively. Data were collected from four query strains of each 
HSG and compared with those of reference strains possessing completely sequenced genomes. The degree of 
difference in genome content between query and reference strains varied depending on HSG. Elaeagnus query 
strains were missing the greatest number (22-32%) of genes compared with the corresponding reference genome; 
Casuarina query strains lacked the fewest (0-4%), with Alnus query strains intermediate (14-18%). In spite of the
remarkable gene loss, genome sizes of Alnus and Elaeagnus query strains were larger than would be expected
based on total length of the absent genes. In contrast, Casuarina query strains had smaller genomes than expected. 

Conclusions: The positive correlation between genome size and host range held true across all investigated strains, 
supporting the hypothesis that size and genome content differences are responsible for observed diversity in host 
plants and host plant biogeography among Frankia strains. In addition, our results suggest that different dynamics 
of shuffling of genome content have contributed to these symbiotic and biogeographic adaptations. Elaeagnus 
strains, and to a lesser extent Alnus strains, have gained and lost many genes to adapt to a wide range of
environments and host plants. Conversely, rather than acquiring new genes, Casuarina strains have discarded genes
to reduce genome size, suggesting an evolutionary orientation towards existence as specialist symbionts. 
Background

Frankia is a genus of soil actinobacteria with unique abi- 
lities to fix atmospheric dinitrogen (N2) and establish endo- 
symbiotic associations with actinorhizal plants comprising 
various non-leguminous trees from eight dicotyledonous 
families [1-3]. This symbiosis, in which Frankia reduces N2 
to ammonium and supplies the resulting product to host 
plants, takes place in root nodules. As a result of the symbi- 
osis, actinorhizal plants grow rapidly, even in nutrient-poor 
soils, and improve soil fertility. Frankia strains are classified
into four host-specificity groups (HSGs) that establish symbiosis
with distinct host plant families [4]. "Alnus" strains
infect plant species in Myricaceae and the genus Alnus of
Betulaceae. "Casuarina" strains infect plant species in 
the genera Casuarina and Allocasuarina of Casuarina- 
ceae. "Elaeagnus" strains exhibit a broader host range, 
infecting plant species in five families (including Elaeag- 
naceae) of the orders Fagales and Rosales. "Rosaceous" 
strains infect plant species in four families of orders 
Rosales and Cucurbitales, although no strains have yet 
been isolated in pure culture. In a phylogenetic tree 
generated from 16S rDNA sequences, strains belonging to
each HSG cluster together in distinct clades [4]. 

In 2007, complete genome sequences were determined 
for representative Frankia strains from Alnus, Casuarina, 
and Elaeagnus HSGs [4]. A surprising finding was that 
despite close phylogenetic relationship (>97.8% identity 
for 16S rDNA), genome sizes were very different among 
HSGs. The largest genome (Elaeagnus strain EAN1pec) is
9.0 Mbp and contains approximately 7,400 genes, whereas 
the smallest one (Casuarina strain CcI3) is only 5.4 Mbp 
and comprises about 4,600 genes. Alnus strain ACN14a 
possesses an intermediate-sized genome (7.5 Mbp) of ap- 
proximately 6,800 genes. This size divergence is the largest 
reported for any such closely related soil bacteria. Genome 
size of these strains correlates with the breadth of their 
host ranges. Comparative genome analysis has revealed 
that the difference in genome size is due to acquisition, 
loss, and duplication of genes occurring at different rates 
in different strains [4]. 

Two studies have uncovered evidence suggesting how 
such extensive diversification has occurred in Frankia 
genomes. Since they are particularly prevalent in Frankia 
genomes and indeed retain the ability to be excised from 
chromosomes, actinomycete integrative and conjugative 
elements (AICEs) may play a role in gene loss and ac- 
quisition [5]. Homologous recombination between in- 
sertion sequences (IS) could have also caused deletions 
of chromosomal segments, as genes contained in IS-rich
regions of ACN14a and EAN1pec genomes are absent
in the smallest genome, that of CcI3 [6].

In the present study, we analyzed content and size of 
12 Frankia genomes using in silico comparative genome 
hybridization (CGH) and pulsed-field gel electrophoresis 
(PFGE) to investigate within-HSG diversity of Frankia 
genomes. 

Results 

Genome sequencing of Frankia strains 

We analyzed genomes of four strains each of Alnus, 
Casuarina, and Elaeagnus HSGs (Table 1). Strains be- 
longing to the same HSG were phylogenetically very 
close, showing > 99% identity in 16S rDNA sequences 
(Figure 1) and > 95% identity in gyrB (DNA gyrase sub- 
unit B gene) and recA (recombinase A gene) sequences 
(data not shown). We obtained tens of millions of 50-bp 
reads from each query genome and conducted in silico 
CGH (Additional file 1) using a reference genome from 
the same HSG: ACN14a for Alnus, CcI3 for Casuarina, 
and EAN1pec for Elaeagnus HSGs (Table 1). Figure 2
contains histograms of coverage rates for all segments. 
Distributions were bimodal; most segments displayed ei- 
ther very low (0-10%) or very high (90-100%) coverage 
rates, with few intermediate values. This result indicates 
that in silico CGH (Additional file 1) can discriminate
between genes shared by reference and query genomes
and those absent in a query genome. Hereafter,
we refer to segments that showed coverage rates of
< 20% as low-coverage-rate (LCR) segments, consisting of
LCR genes and LCR intergenic regions (IGRs). An LCR 
segment is likely absent from a query genome, either as 
a consequence of its deletion from the query genome or 
its insertion into a reference genome. 
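
The LCR rule described above can be sketched in a few lines of Python. This is only an illustration: the function names and the example coverage values are hypothetical, and the only quantity taken from the text is the 20% cutoff.

```python
# Cutoff stated in the text: segments with coverage rate < 20% are LCR segments.
LCR_THRESHOLD = 20.0  # percent


def is_lcr(coverage_rate, threshold=LCR_THRESHOLD):
    """Return True when a coverage rate (%) falls below the LCR cutoff."""
    return coverage_rate < threshold


def split_segments(coverage_by_segment):
    """Partition {segment_id: coverage %} into (LCR ids, retained ids)."""
    lcr = [s for s, c in coverage_by_segment.items() if is_lcr(c)]
    kept = [s for s, c in coverage_by_segment.items() if not is_lcr(c)]
    return lcr, kept


# Made-up coverage rates for illustration only (not data from this study).
example = {"gene_A": 98.5, "igr_A_B": 3.0, "gene_B": 0.0, "gene_C": 45.0}
lcr_segments, retained_segments = split_segments(example)
```

Because the observed distributions were bimodal, the exact position of the cutoff between the two modes should change the classification of only a small number of segments.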

Table 2 lists the number of LCR genes detected for 
each query genome. LCR genes were most prominent 
in Elaeagnus strains, accounting for 22-32% of all 
genes in the corresponding reference genome. The 
number of LCR genes varied among Elaeagnus strains; 
more were absent in strains EP01 and EUr01 than
in Ema2 and EU05. Compared with Elaeagnus strains, 
Alnus strains featured fewer LCR genes, which ac- 
counted for 14-18% of genes in the reference genome. 
In Casuarina strains, LCR genes were much rarer; they 
were not detected for two strains (CaE04 and T7), and 
accounted for at most 4% of total genes (Ceq1).
We plotted coverage rates of all segments in order of 
their appearance in the reference genome (Figure 3). 
LCR segments did not distribute randomly, but tended 
to be clustered in particular regions of the reference 
genomes. 

In genomes of ACN14a, CcI3, and EAN1pec, respectively,
1,633, 185, and 2,685 genes were scored as LCR
genes for at least one of the four query strains (nonre- 
dundant LCR genes; Figure 4). In Alnus and Elaeagnus 
HSGs, about 40% of nonredundant LCR genes were 
scored as LCR genes for all four query strains (Figure 4), 
indicating that they were commonly absent in genomes 
of these strains. The remaining genes were scored as 
LCR genes for one to three strains; the distribution of 
these genes was apparently unbiased, except that LCR 
genes specific to Asi1 and those shared by EP01 and
EUr01 predominated. In the Casuarina HSG, 99% of
nonredundant LCR genes were missing only in strain
Ceq1; only a few or no LCR genes were associated with
the other strains. 
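
The bookkeeping behind nonredundant LCR genes can be illustrated with set operations. The sketch below assumes that each query strain is represented by the set of reference gene identifiers scored as LCR; the function name and the gene identifiers are hypothetical.

```python
from collections import Counter


def summarize_lcr_overlap(lcr_by_strain):
    """Summarize LCR genes across query strains sharing one reference genome.

    lcr_by_strain maps a strain name to the set of reference gene IDs scored
    as LCR for that strain.
    """
    gene_sets = list(lcr_by_strain.values())
    # Nonredundant LCR genes: scored as LCR for at least one strain.
    nonredundant = set().union(*gene_sets)
    # Genes scored as LCR for every strain.
    shared_by_all = set.intersection(*gene_sets) if gene_sets else set()
    # For each gene, the number of strains in which it is an LCR gene.
    strains_per_gene = Counter(g for genes in gene_sets for g in genes)
    return nonredundant, shared_by_all, strains_per_gene


# Made-up gene IDs for illustration only.
toy = {
    "strain1": {"g1", "g2", "g3"},
    "strain2": {"g2", "g3"},
    "strain3": {"g2", "g4"},
}
nonredundant, shared_by_all, strains_per_gene = summarize_lcr_overlap(toy)
```

The percentages in a Venn diagram such as Figure 4 would then correspond to the size of each overlap class divided by the number of nonredundant LCR genes.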

Confirmation by PCR

We used PCR to confirm whether the identified LCR 
segments were structurally missing in the query ge- 
nomes. We designed primer sets that flanked individual 
or clustered LCR segments and performed PCR using 
Asi1, Ceq1, and Ema2 genomic DNA as templates. Most
amplification products (69% for Asi1, 100% for Ceq1,
and 75% for Ema2) were smaller than the size expected 
based on reference genome sequences (Additional file 2), 
indicating that those LCR segments were missing in the 
query genomes. In contrast, some bands were larger than 
the expected size, suggesting insertion of DNA segments 
at these loci (Additional file 2). 
Table 1 Frankia strains used in this study

HSG        Strain        Source plant                    Geographic origin   Usage      No. of reads
Alnus      ACN14a [7]    Alnus viridis subsp. crispa     Quebec, Canada      Reference  -
           AH1^a         A. hirsuta                      Aomori, Japan       Query      49,569,598
           AHm01^a       A. hirsuta ssp. microphylla     Iwate, Japan        Query      35,732,054
           Asi1 [8]      A. sieboldiana                  Okayama, Japan      Query      48,189,821
           Mru1 [8]      Myrica rubra                    Okayama, Japan      Query      34,687,733
Casuarina  CcI3 [9]      Casuarina cunninghamiana        Petersham, U.S.A.   Reference  -
           CaE03 [10]    C. equisetifolia                Okinawa, Japan      Query      46,493,718
           CaE04^a       C. equisetifolia                Senegal             Query      38,361,206
           Ceq1 [8]      C. equisetifolia                Okayama, Japan      Query      46,629,137
           T7^a          C. cunninghamiana               Ismailia, Egypt     Query      44,174,301
Elaeagnus  EAN1pec [11]  Elaeagnus angustifolia          Ohio, U.S.A.        Reference  -
           Ema2 [8]      E. macrophylla                  Okayama, Japan      Query      37,558,396
           EP01^a        E. pungens                      Kagoshima, Japan    Query      43,449,878
           EU05^a        E. umbellata                    Toyama, Japan       Query      26,180,153
           EUr01^a       E. umbellata ssp. rotundifolia  Tokyo, Japan        Query      42,309,826

^a Obtained in this study.



LCR gene properties 

GC content at the third codon position (GC3) and 
codon adaptation index (CAI) of all genes are shown in 
Figure 5. Nonredundant LCR genes exhibited lower 
average GC3 and CAI values than other genes, suggest- 
ive of foreign origin, possibly through horizontal gene 
transfer. Dominant functions of nonredundant LCR 
genes are listed in Table 3. In all HSGs, the largest share
(40-65%) encoded hypothetical proteins with unknown
functions. Three functional categories (transcriptional
regulation, transport-associated, and transposase) were
common to all three HSGs. In
the Alnus HSG, functional categories related to nonribosomal
peptide and polyketide synthetases and acyl-CoA
metabolism, involved in synthesis of bioactive secondary 
metabolites such as antibiotics and siderophores [12], 
were prevalent. In the Casuarina HSG, bacteriophage- 
related functions, such as restriction and modification 
system, CRISPR [13], integrase, and excisionase, were 
prominent. 
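
GC3 values in this study were taken from the MaGe database (see Methods). For readers unfamiliar with the measure, the following minimal sketch shows how GC3 can be computed for a single coding sequence. The function name is hypothetical, and the sketch makes one simplifying assumption: every codon, including a terminal stop codon if present, is counted.

```python
def gc3(cds):
    """Fraction of G or C bases at third codon positions of a coding sequence."""
    seq = cds.upper()
    if len(seq) == 0 or len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a positive multiple of 3")
    third_positions = seq[2::3]  # every third base, starting at codon position 3
    gc_count = sum(base in "GC" for base in third_positions)
    return gc_count / len(third_positions)


# Toy sequence (ATG GCC GAC TGA); third positions are G, C, C, A.
toy_gc3 = gc3("ATGGCCGACTGA")
```

In a high-GC genome such as that of Frankia, native genes are expected to show high GC3, so a markedly lower value is one common indicator of recent horizontal acquisition.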

PFGE 

We estimated genome sizes of studied Frankia strains 
via PFGE of genomic DNA digested with DraI or PsiI.
Sizes obtained using either restriction enzyme were 
mostly consistent (Figure 6). Results from two reference 
strains (ACN14a and CcI3) revealed that the estimated 
sizes were slightly smaller than actual genome sizes 
(Figure 6) for two reasons: i) small bands less than 50 kb 
migrated out of the gel; and ii) the relative migration 
rate of Frankia DNA was faster than that of yeast marker 
DNA (Additional file 3). Expected sizes of query genomes,
based on the assumption that they lacked all LCR segments,
are shown in Figure 6. After the above underestimation was
taken into account, genome sizes of the four Alnus query
strains were larger than expected (Figure 6),
but were similar to that of the reference genome (ACN14a).
Estimated genome sizes of the four Elaeagnus query strains 
were apparently larger than expected. Two strains (Ema2 
and EU05) appeared to have genome sizes similar to the 
reference strain EAN1pec when underestimation was taken
into account. Notably, the estimated genome size of EP01,
in spite of the absence of more than 30% of genes, was
much larger than that of EAN1pec (Table 2). An opposite
situation was observed in Casuarina query strains. Al- 
though few genes were missing in genomes of CaE03, 
CaE04, and T7 (Table 2), their estimated genome sizes were 
significantly smaller than the reference strain CcI3. Little 
similarity in banding patterns was observed among or even 
within HSGs (Additional file 3), suggesting divergence of 
genome structure. As reported for the reference strains [4], 
genome sizes of the query strains were correlated with ex- 
tent of their host ranges: Casuarina strains possessed the 
smallest genomes, Elaeagnus strains the largest, with Alnus
strains intermediate. 
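
The expected genome size used in this comparison is the reference genome size minus the total length of LCR segments. A minimal sketch of this arithmetic follows; the function name and the segment lengths are hypothetical, and the 7.5-Mbp reference size is the approximate ACN14a value given in the Background.

```python
def expected_genome_size(reference_size_bp, lcr_segment_lengths_bp):
    """Expected size of a query genome that lacks every LCR segment.

    Assumes the query differs from the reference only by deletion of the
    LCR segments, i.e. it carries no DNA that is absent from the reference.
    """
    total_lcr = sum(lcr_segment_lengths_bp)
    if total_lcr > reference_size_bp:
        raise ValueError("LCR segments cannot exceed the reference genome size")
    return reference_size_bp - total_lcr


# Made-up LCR segment lengths for illustration only.
toy_expected = expected_genome_size(7_500_000, [1_200, 800, 50_000])
```

A PFGE estimate above this expected value is therefore evidence that the query genome contains sequence not present in the reference.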

Clustering of LCR genes 

To evaluate clustering of LCR genes, we searched refer- 
ence genomes for consecutive arrays of LCR segments, 
which were considered to be LCR gene clusters if they 
contained two or more LCR genes. Size and number of 
identified LCR gene clusters are shown in Figure 7. In 
every HSG, the vast majority of clusters were small, con- 
taining less than 10 genes. A substantial number of 
Figure 1 Phylogenetic tree constructed from 16S rDNA
sequences of Frankia strains used in this study. Reference strains
are boxed. Nucleotide sequence identities (%) between query and
reference strains of the same host-specificity group (HSG) are shown.
Identities between strains of distinct HSGs were 98-99% for Alnus vs.
Casuarina HSGs, 97-98% for Casuarina vs. Elaeagnus HSGs, and 97-98%
for Alnus vs. Elaeagnus HSGs. Frankia from a Purshia tridentata nodule
was used as an outgroup.



independent LCR genes that did not form clusters were 
also detected (Figure 7). Fewer LCR gene clusters were 
found in Ceq1 because this strain is associated with only a
small number of LCR genes (Table 2). Data for strains
CaE03, CaE04, and T7 are not shown because these strains were
associated with few or no LCR genes (Table 2) and no
LCR gene cluster was detected.
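
The cluster definition used here (a consecutive array of LCR segments containing two or more LCR genes) can be sketched as a single pass over segments in reference-genome order. The function name, the input layout, and the toy segments are hypothetical. The sketch treats the genome as linear and so would not join a run that spans the origin of a circular chromosome.

```python
def find_lcr_clusters(segments, min_genes=2):
    """Find runs of consecutive LCR segments that contain >= min_genes genes.

    segments: list of (segment_id, kind, is_lcr) tuples in genome order,
    where kind is "gene" or "IGR". Returns a list of clusters, each a list
    of segment IDs.
    """
    clusters, run = [], []
    # A non-LCR sentinel at the end closes any run that reaches the last segment.
    for seg_id, kind, lcr in segments + [(None, None, False)]:
        if lcr:
            run.append((seg_id, kind))
            continue
        if sum(1 for _, k in run if k == "gene") >= min_genes:
            clusters.append([s for s, _ in run])
        run = []
    return clusters


# Made-up segments for illustration only.
toy_segments = [
    ("g1", "gene", True), ("i1", "IGR", True), ("g2", "gene", True),
    ("i2", "IGR", False),
    ("g3", "gene", True), ("i3", "IGR", True),
    ("g4", "gene", False),
    ("g5", "gene", True), ("i5", "IGR", True), ("g6", "gene", True),
    ("i6", "IGR", True), ("g7", "gene", True),
]
toy_clusters = find_lcr_clusters(toy_segments)
```

In the toy input, the run g3-i3 contains a single LCR gene and is therefore reported as an independent LCR gene rather than a cluster.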

LCR gene clusters may represent a kind of genomic island,
which is typically integrated into the chromosome by site-specific
recombination at a tRNA gene through the action of
integrase [5,14]. We searched both ends of each LCR
gene cluster for direct repeats of tRNA sequences, but
only two clusters were associated with such sequences
(CcI3: Francci3_1194 to IGR Francci3_1203-Francci3_R0023,
tRNA-Gly; EAN1pec: IGR Franean1_R0059-Franean1_7129
to Franean1_7139, tRNA-Glu) (Table 4).

AICEs are prevalent in Frankia genomes, and have been 
experimentally confirmed to retain their ability to be ex- 
cised from chromosomes [5]. Although three AICEs have 
been identified in ACN14a and CcI3 genomes, and four in 
the EAN1pec genome [5], only one AICE, in ACN14a
(Faln2929), corresponded closely to any LCR gene cluster
in our study (data not shown).

DNA segments flanked by homologous sequences (direct 
repeat sequences) or flanked by ISs from the same family 
Figure 2 Histograms of coverage rate for all segments in
(a) Alnus, (b) Casuarina, and (c) Elaeagnus strains. The total
number of segments was 12,307 for Alnus, 8,278 for Casuarina, and
13,274 for Elaeagnus strains. 



can be excised from chromosomes. LCR gene clusters asso- 
ciated with direct repeat sequences were relatively frequent 
in ACN14a and EAN1pec, but nevertheless represented
only a small fraction of observed clusters (Table 4). In the
EAN1pec genome, several clusters were associated with ISs
belonging to the same family. To summarize, however, few 
LCR gene clusters were associated with elements previously 
known to be involved in insertion and deletion of DNA 
segments. 

Discussion 

In this study, we used an in silico CGH method based on 
the application of next-generation sequencing technology. 
Table 2 Number of low-coverage-rate (LCR) genes

HSG        Reference  No. of genes^a  Query  No. of LCR genes  Percentage^b (%)
Alnus      ACN14a     6774            AH1    912               14
                                      AHm01  1046              15
                                      Asi1   1241              18
                                      Mru1   1107              16
Casuarina  CcI3       4569            CaE03  2                 0.04
                                      CaE04  0                 0
                                      Ceq1   184               4
                                      T7     0                 0
Elaeagnus  EAN1pec    7250            Ema2   1743              24
                                      EP01   2313              32
                                      EU05   1561              22
                                      EUr01  2245              31

^a Total number of protein-coding and RNA genes in the reference genome.
^b Percentage of genes in the reference genome with low-coverage rates
(LCR genes).

Resolution obtained using in silico CGH is higher than 
that available from DNA array-based CGH. In addition, in 
silico CGH experiments are less time-consuming, as this 
technique does not require construction of DNA arrays. 
On the other hand, this method is inferior to comparative 
analyses that use assembled genome sequences and hom- 
ology search programs. When a large number of undeter- 
mined genomes need to be compared, however, in silico 
CGH is useful, because complete genome assembly is very 
laborious. 

Recent comparative genomic studies have revealed that 
bacterial genome contents vary greatly, even among closely 
related species and strains [15-19]. As confirmed in our 
study, this is also true for Alnus and Elaeagnus HSG 
Frankia strains. An unexpected and novel finding of our 
study, however, is that this diversity varies depending on 
the HSG. Alnus strains lacked 14-18% of genes present in 
a reference genome from the same HSG, whereas more 
than 20% of genes were absent in Elaeagnus strains, with 
over 30% lacking in two strains (EP01 and EUr01)
(Table 2). These divergences are much greater than that 
observed between the actinobacterial species Streptomyces 
coelicolor and Streptomyces lividans (7%) [15]. In the case 
of Escherichia coli, a comparable level (about 25%) of di- 
vergence occurs between pathogenic and non-pathogenic 
strains [16]. Because Frankia and Streptomyces occupy the 
same ecological niche, i.e. soil, environmental factors of- 
fering differing opportunities for horizontal gene exchange 
within the bacterial community cannot be responsible for 
the discrepancy. Inherent properties specific to Frankia, 
such as domino effects (see below), may allow such 
dynamic changes. 

In spite of these remarkable levels of gene loss, PFGE re- 
vealed that actual genome sizes of Alnus and Elaeagnus 



query strains were not as small as expected based on total 
LCR gene length (Figure 6). This result indicates that 
these query strains carry genes that are absent in the 
reference genomes, thus compensating for the reduced 
genome size due to gene loss. Insertion of additional 
DNA segments was indeed observed in query strains 
(Additional file 2) based on genomic PCR. These strains 
have thus both lost and acquired significant numbers of 
genes over the course of evolution; as a consequence, gen- 
ome contents have diverged greatly, even within the same 
HSG. Interestingly, such shuffling of genome content appears
to have occurred to different extents in the
two HSGs. More dynamic shuffling has taken place in genomes
of Elaeagnus strains than in Alnus, as evidenced by
the greater extent of gene loss (number of LCR genes; 
Table 2) and higher compensated genome size (Figure 6) 
in the Elaeagnus HSG. In Alnus strains, gene acquisition 
and loss seems to have been mostly balanced, because the 
number of LCR genes and genome size are similar among 
strains (Table 2 and Figure 6); this balance was not well 
maintained in Elaeagnus strains. 

Unlike the other two HSGs, very few LCR genes were 
identified among Casuarina strains (Table 2), indicating 
that genome contents were highly similar within the 
HSG. In particular, strains CaE03 and CaE04 were missing 
only two or no LCR genes, respectively (Table 2), revealing 
that these query strains possessed almost all the genes in 
the reference genome (CcI3). Genome sizes of the query 
strains were significantly smaller than that of CcI3, how- 
ever (Figure 6). These results suggest that some compo- 
nents of multigene families in the CcI3 genome were 
missing in CaE03 and CaE04. Normand et al. [4] have 
pointed out that transposase genes are frequently dupli- 
cated in the CcI3 genome, forming large multigene fam- 
ilies. Loss of such transposase genes may consequently be 
responsible for the observed size reductions. 

When complete sequences were obtained for three 
representative Frankia strains, the most surprising find- 
ing was their unusual size divergence. To explain the 
biological significance of this divergence, genome size 
and content have been proposed to influence host range 
and biogeographical adaptation of bacterial strains [4]. 
Casuarina strain CcI3 has the smallest genome, consist- 
ent with the narrowest range of hosts and the limited 
environment of its host plants' habitat (temperate re- 
gions of Australia) [20]. In contrast, Elaeagnus strain 
EAN1pec has the largest genome, helping it to achieve
the broadest host range and to adapt to the wide range 
of soil types and climates under which its host plants 
grow [20]. Our PFGE results support this hypothesis, as 
this correspondence between genome size and HSG held 
true for the 12 strains analyzed (Figure 6). In addition, 
our results suggest that the dynamics of genome content 
shuffling, along with genome size, have contributed to 
Figure 3 Distribution of coverage rates over genomes. Coverage rates of segments (gene and IGR) are represented by vertical black bars
arranged in order of their appearance in the genome. Horizontal and vertical axes indicate segment position and coverage rate (%), respectively. 



these symbiotic and biogeographic adaptations. Ge- 
nomes of Elaeagnus strains have likely discarded and ac- 
quired a greater number of genes to manage adaptation 
to a wider range of hosts (spanning five families) and en- 
countered environments. Alnus strains may have also 
done so, but to a lesser extent, because their host range 
(spanning two families) is not as broad as that of Elaeag- 
nus strains. Indeed, LCR genes are associated with regu- 
latory, metabolic, and transport functions (Table 3) 
suggestive of such adaptive roles. In Bradyrhizobium, ac- 
quisition of genomic islands is reported to influence 
symbiotic nitrogen fixation properties [19]. In contrast 
to Elaeagnus and Alnus, Casuarina strains have not ac- 
quired new genes, but have instead discarded them to 



reduce their genome sizes; this suggests an evolutionary 
orientation towards existence as specialist symbionts [4]. 
Casuarina strains infect only a narrow spectrum of 
hosts, spanning two genera, and show reduced sapro- 
phytic activity which is evidenced by the fact that these 
strains have not been isolated from soils outside the na- 
tive habitats of their host plants [21,22]. Such reductive 
genome evolution is often observed in obligate symbiotic 
bacteria [23,24]. 

Most detected LCR gene clusters were not flanked by 
elements known to be associated with genomic islands 
[14] (Table 4). This is similar to the case of E. coli [25].
Because current cluster structure is a product of mul- 
tiple DNA rearrangement steps, elements functional in 
Figure 4 Venn diagram representing overlap of LCR genes
among Frankia strains belonging to (a) Alnus, (b) Casuarina,
and (c) Elaeagnus HSGs. The total number of nonredundant LCR 
genes is shown above the diagram. Numbers in the diagram are 
percentages of the nonredundant LCR genes associated with the 
indicated overlapping strains. 



the past may no longer be located at cluster termini. We 
therefore cannot determine whether the disparate occur- 
rence of such elements explains differences in genome 
stability. 

On the other hand, den Bakker et al. [26] have pro- 
posed a "domino" effect theory to explain why a particu- 
lar genomic region is subject to active gene acquisition 
and loss. If a genome has acquired a genomic island that 
Table 3 Dominant functions of the top 10 nonredundant
LCR genes identified in each studied Frankia
host-specificity group

Function                                                  Percentage

Alnus
Hypothetical, conserved hypothetical, unknown function    65
Transcriptional regulation                                6.0
Transport-associated                                      3.8
Protein kinase                                            1.7
Nonribosomal peptide and polyketide synthetases           1.2
Transposase                                               1.2
Acyl-CoA metabolism                                       1.0
Amino acids metabolism                                    0.9

Casuarina
Hypothetical, conserved hypothetical, unknown function    49
Transposase                                               9.1
Transcriptional regulation                                5.4
Transport-associated                                      4.3
Restriction and modification system                       2.2
CRISPR associated                                         2.2
Integrase                                                 1.6
Putative ATP/GTP-binding protein                          1.6
AMP-dependent synthetase and ligase                       1.6
DNA synthesis                                             1.1
NUDIX hydrolase                                           1.1
Excisionase                                               1.1

Elaeagnus
Hypothetical, conserved hypothetical, unknown function    40
Transcriptional regulation                                7.9
Transposase                                               6.0
Transport-associated                                      5.3
Short-chain dehydrogenase/reductase                       2.8
Acyl-CoA metabolism                                       1.8
Protein kinase                                            1.3
Methyltransferase of unknown function                     1.2
Integrase                                                 1.1
Alcohol/Aldehyde dehydrogenase                            1.0


encodes beneficial gene products, the island will be main- 
tained. Most parts of the island will be functionally neutral, 
however; they may easily accept insertion and deletion of 
genes without losing the island's adaptive value, making the
region a hot spot for gene exchange. We can use this hypothesis
to explain the different dynamics of genome content
shuffling observed in Frankia; the more genomic



islands (LCR gene clusters) in a genome, the more chances 
for gene acquisitions and losses. 

Conclusions 

Our results suggest that two genomic properties have af- 
fected diversity in host plant range and biogeography in 
Frankia strains. The first property, genome size, was 
previously proposed by Normand et al. [4] and has been 
validated by our study. The second property is the dy- 
namics of genome content shuffling. In other words, 
Elaeagnus strains have both retained and exchanged a 
large number of accessory genes to adapt to diverse host 
plant species, soil types, and climates. In contrast, Casu- 
arina strains have discarded rather than acquired genes 
to limit hosts and inhabited environments, suggestive of 
an evolutionary preference for specialist symbiosis. Dif- 
ferences in the extent of genome content shuffling can 
be partially explained by domino effects: if a strain car- 
ries more genomic islands, then more neutral regions 
accompany them, thus enhancing genome flexibility to- 
wards gene acquisition and loss. 

Methods 

Bacterial strains 

Frankia strains AH1, AHm01, CaE04, EP01, EU05, and
EUr01 were isolated using the differential filtration
method [27] from root nodules collected in the field 
(Table 1). Lobes of fresh nodules were sterilized with 1% 
sodium hypochlorite for 5 min, washed with sterilized 
water, and homogenized in a mortar. The homogenates 
were passed through filters with 50- and 20-µm nylon
mesh screens. Plant residues and Frankia vesicle clusters 
collected through filtration were mixed in 100-ml flasks 
with 40 ml of modified BAP medium [28] lacking am- 
monium chloride (BAP-). The flasks were placed at 29°C 
in darkness. Frankia strain T7 was isolated from root 
nodules of Casuarina cunninghamiana. The fresh nod- 
ules were washed and dissected into individual lobes and 
surface-sterilized as described in [29]. Each lobe was 
checked for sterility in sterile nutrient-rich medium. 
Nodules free from contaminant were dissected and 
transferred to 125-ml flasks containing modified BAP 
medium [30] and incubated at 28°C. Frankia filaments 
were homogenized and diluted 1:100 (v/v) with sterile 
distilled water; 1 ml of the suspension was then trans- 
ferred to melted agar DPM medium [31]. The plate was 
agitated, allowed to solidify, and incubated at 28°C for 
3 weeks. A single colony was picked up, homogenized, 
and cultured in liquid B medium [28]. 

Frankia strains were maintained in BAP (ACN14a), 
BAP- (AH1, AHm01, CaE03, CaE04, T7, EP01, EU05,
and EUr01), BAP-T [32] (CcI3), or Qmod [33] (Asi1,
Mru1, Ceq1, and Ema2) media in tissue culture flasks
(TPP, Trasadingen, Switzerland) at 28°C. 
Figure 6 Genome size estimated by PFGE. Sizes estimated with DraI- and PsiI-digested genomic DNAs are indicated by closed and open
circles, respectively. Blue bars represent actual genome sizes of reference strains (indicated by asterisks). Black bars correspond to expected sizes
of query genomes, calculated by subtracting total length of LCR segments from reference genome size.



Genomic DNA preparation 

Frankia cells were suspended in TE buffer (10 mM Tris- 
HCl [pH 8.0] and 1 mM EDTA) containing 8 mg/ml
lysozyme, and incubated at 37°C for 1 h. Cells were col- 
lected by centrifugation, and genomic DNA was purified 
using a DNeasy Plant Mini kit (Qiagen, Hilden, Germany) 
according to the manufacturer's instructions.

Phylogenetic analysis of 16S rDNA 

The full-length 16S rDNA region was amplified by PCR
using primers CcI3 16S rRNA f1 (5'-TTGATGGAGAG
TTTGATGGTGG-3') and CcI3 16S rRNA r1 (5'-AGAA
AGGAGGTGATGGAGG-3'). Residual primers and nucleotides
were removed by exonuclease I (Takara Bio,
Ohtsu, Japan) and shrimp alkaline phosphatase (Roche,
Mannheim, Germany), and PCR products were directly
sequenced using BigDye terminator v3.1 (Applied Biosystems,
Foster City, CA, USA). A phylogenetic tree was
constructed by the neighbor-joining method [34] using
Genetyx (Genetyx, Tokyo, Japan). GenBank accession
numbers of generated sequences are as follows: ACN14a,
NC_008278.1; AH1, AB849940; AHm01, AB849941; Asi1,
AB847121; Mru1, AB848357; CcI3, NC_007777.1; CaE03,
AB849939; CaE04, AB849942; Ceq1, AB848358; T7,



AB850642; EAN1pec, NC_009921.1; Ema2, AB848359;
EP01, AB849943; EU05, AB849944; EUr01, AB849945; and
Purshia nodule, AF034776.

Next-generation genomic sequencing 

We sequenced genomes of Frankia strains using a SOLiD 4
next-generation sequencing system (Applied Biosystems).
Libraries were generated from 1 µg genomic DNA using a
SOLiD fragment library construction kit. Templated beads
were prepared with a SOLiD ePCR kit v2 and XD beads enrichment
kit, and then deposited on a glass slide using a
SOLiD XD slide and deposition kit v2. Fifty base pairs at
the ends of library fragments were sequenced using a
SOLiD ToP fragment BC sequencing kit and a SOLiD ToP
instrument buffer kit. All experiments were performed according
to the manufacturer's instructions.

Data analyses 

To compare the content of Frankia genomes, we used a 
strategy named in silico CGH (Additional file 1), which
has been used to find missing genes in bacterial genomes 
[35]. The 50-bp reads (query reads) obtained from Frankia 
genomes of unknown sequence (query genomes) were 
mapped to a reference genome whose complete sequence 
had already been reported [4]: Frankia strains ACN14a
(GenBank: NC_008278.1), CcI3 (NC_007777.1), and
EAN1pec (NC_009921.1). Mapping was conducted using
Bioscope software (Applied Biosystems). The term "map" 
means to match an individual 50-bp query read to a re- 
gion with significant sequence similarity on a reference 
genome. Regions of a reference genome onto which few 
or no reads are mapped were deduced to be absent in the 
query genome. To quantitatively evaluate the mapping re- 
sults, we dissected reference genome sequences into two 
types of segments— gene and IGR— and calculated "cover- 
age rate" of each segment. Coverage rate was calculated as 
the percentage of nucleotides in the segment that were 
mapped by one or more reads (Additional file 1). A low 
coverage rate indicated that few query reads were mapped 
to a segment in a reference genome; in such cases, that 
segment was likely absent in the query genome. Coverage 
rates of all the genes in Alnus, Casuarina and Elaeagnus 
strains are shown in Additional file 4. 
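
The coverage-rate definition above can be illustrated with a minimal Python sketch. It is not the Bioscope pipeline used in this study: it assumes that mapped reads are already available as 1-based, inclusive (start, end) positions on the reference genome, and the function names and the 50% cutoff in is_likely_absent are hypothetical values chosen only for illustration.

```python
from typing import Iterable, List, Tuple

# 1-based, inclusive (start, end) coordinates on the reference genome.
Interval = Tuple[int, int]


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def coverage_rate(segment: Interval, reads: Iterable[Interval]) -> float:
    """Percentage of nucleotides in `segment` mapped by one or more reads."""
    seg_start, seg_end = segment
    seg_len = seg_end - seg_start + 1
    covered = 0
    for start, end in merge_intervals(reads):
        # Length of the part of this merged read block inside the segment.
        overlap = min(end, seg_end) - max(start, seg_start) + 1
        if overlap > 0:
            covered += overlap
    return 100.0 * covered / seg_len


def is_likely_absent(rate: float, threshold: float = 50.0) -> bool:
    """Flag a low-coverage segment; the threshold is a hypothetical value."""
    return rate < threshold


# A 100-nt gene at positions 101-200 with two overlapping 50-bp reads.
gene = (101, 200)
reads = [(91, 140), (131, 180)]
print(coverage_rate(gene, reads))  # 80 of 100 nucleotides are covered
```

Merging the read intervals before counting ensures that a nucleotide mapped by several reads is counted only once, which matches the "one or more reads" wording of the definition.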



Data for GC3 and CAI were obtained from the MaGe
database (https://www.genoscope.cns.fr/agc/mage/). A list
of ISs annotated for Frankia genomes was taken from Bickhart
et al. [6]. We used the ssearch program
(http://fasta.bioch.virginia.edu/fasta_www2/fasta_intro.shtml), which
implements the Smith-Waterman algorithm [36], to find
repeat sequences at ends of LCR gene clusters.
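
As an illustration of the Smith-Waterman local alignment that underlies such a repeat search, the following minimal Python sketch finds the best-scoring local alignment between two sequences, for example the flanking sequences at the two ends of a gene cluster. It is not the ssearch program: the scoring values (match +2, mismatch -1, linear gap -2) and the example sequences are assumptions made for illustration, and ssearch additionally applies its own scoring matrices and significance statistics.

```python
from typing import Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, str, str]:
    """Return (best score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; local alignment never lets a cell drop below zero.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(0,
                              score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero-score cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Hypothetical flanking sequences sharing the 7-nt repeat ACGTACG.
left_flank = "TTTACGTACGGG"
right_flank = "CCACGTACGAA"
print(smith_waterman(left_flank, right_flank))
```

Because every cell is floored at zero, the alignment can start and end anywhere in either sequence, which is what allows a short shared repeat to be detected inside otherwise unrelated flanks.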

PCR 

We conducted PCR using genomic DNAs (10 ng) of 
strains Asil, Ceql, and Ema2 as templates, along with 
the primers listed in Additional file 5, GC buffer I 
(Takara Bio), and EX Taq polymerase (Takara Bio). 

Pulsed-field gel electrophoresis 

Cells of Frankia were harvested from 5-15 ml of culture
solution and resuspended in 0.3 ml HE buffer
(10 mM 4-[2-hydroxyethyl]-1-piperazineethanesulfonic
acid [pH 8.0] and 1 mM EDTA). The cell
suspension was mixed with an equal volume of 2%
low-melting agarose (Agarose-LM plaque; Nacalai Tesque,
Kyoto, Japan) in HE buffer and solidified in plug
molds (Bio-Rad, Hercules, CA, USA). The agarose
plugs were incubated with 2 mg ml⁻¹ lysozyme in HE
buffer at 37°C for 2 h, and then with 1 mg ml⁻¹ proteinase
K (Nacalai Tesque) in NDS buffer (0.5 M
EDTA, 10 mM Tris-HCl [pH 8.0], and 1% SDS) at 50°C
for 24 h. We removed the proteinase K solution and
washed the plugs once with 10 ml HE buffer containing
0.1 mM phenylmethylsulfonyl fluoride (Nacalai
Tesque) and three times with HE buffer. The plugs
were then washed three times with TE buffer and
equilibrated with 1× buffer supplied with the restriction
enzymes. We digested DNA in 200 µl solution
containing 1× restriction enzyme buffer, 0.5 mM
dithiothreitol, 1 mg ml⁻¹ bovine serum albumin, and
30 units of DraI (Roche) or 1 µl of FastDigest PsiI
(Thermo Scientific, Waltham, MA, USA) at 37°C for
5 h or overnight. Electrophoresis was performed under
the conditions described in Additional file 3 using the
CHEF-DR III system (Bio-Rad). Chromosomes of Saccharomyces
cerevisiae and Schizosaccharomyces pombe
were used as size standards (Bio-Rad). Gels were
stained with SYBR Gold (Life Technologies, Carlsbad, CA,
USA), and electropherograms were obtained under UV
irradiation.



Table 4 LCR gene clusters associated with potential insertion/deletion elements

Reference genome    Total^a    tRNA repeats    Direct repeats^b    Same ISs
ACN14a              578        0               17                  0
CcI3                27         1               0                   0
EAN1pec             1090       1               35                  8

^a Number of nonredundant LCR gene clusters found in all query strains.
^b Direct repeats of tRNA sequences are not included.

Availability of supporting data

All the data supporting the results of this article are in- 
cluded as additional files. 

Additional files 



Additional file 1: Schematic overview of in silico CGH. PDF file (.pdf)
explaining in silico CGH procedure.

Additional file 2: Results of genomic PCR. PDF file (.pdf) containing
electropherograms of PCR products. Genomic DNAs from Asil, Ceql, and
Ema2 were used as templates.

Additional file 3: Electropherograms from PFGE. PDF file (.pdf)
containing gel images from PFGE. Numbers on the left side of the image
are Mbp of size standards (chromosomal DNA of S. cerevisiae and S.
pombe). Numbers with arrowheads on the right side of images are
fragment sizes estimated from the size standards. Conditions for
electrophoresis are indicated under the image: (A) 0.5× TBE (45 mM Tris
base, 45 mM borate, and 1 mM EDTA [pH 8.5]) with 10 mM thiourea, 1%
agarose, 6 V cm⁻¹ voltage, 60-120-s pulse time, 120° field angle, and 24-h
run time at 13°C; (B) 1× TAE (40 mM Tris-acetate, 2 mM EDTA [pH 8.5]) with
10 mM thiourea, 0.8% agarose, 2 V cm⁻¹ voltage, 1200-1800-s pulse time,
106° field angle, and 48-h run time at 14°C; (C) 1× TAE with 10 mM thiourea,
0.8% agarose, 3 V cm⁻¹ voltage, 120-1200-s pulse time, 106° field angle, and
24-h run time at 13°C.

Additional file 4: Coverage rates of genes. Microsoft Excel file
showing coverage rates of all genes in Alnus, Casuarina and Elaeagnus
strains analyzed in this study.

Additional file 5: List of primers. Microsoft Excel file containing a list 
of primers used for genomic PCRs of Additional file 2. 



Competing interests 

The authors declare that they have no competing interests.

Authors' contributions

KK conceived the study, conducted most of the experiments, and wrote the
manuscript. TY and SRM isolated and characterized Frankia strains, and wrote
one section of the Methods. HS isolated and maintained Frankia strains and
analyzed 16S rDNA sequences. TU organized installation and maintenance of
the SOLiD system. KK, TY, and TU participated in the discussion. All authors
read and approved the final manuscript.

Authors' information

Hideo Sasakawa is an emeritus professor.

Acknowledgments 

We thank Dr. Louis S. Tisa (University of New Hampshire) for providing
Frankia strain CcI3, Dr. Philippe Normand (CNRS) for providing Frankia strain
ACN14a, Ms. Yuu Kucho for performing next-generation sequencing, and Dr.
Petar Pujic (CNRS) for sharing the PFGE protocol. This work was supported
by the Asahi Glass Foundation.

Author details 

¹Graduate School of Science and Engineering, Kagoshima University, 1-21-35
Korimoto, Kagoshima 890-0065, Japan. ²Department of Forest Microbiology,
Forestry and Forest Products Research Institute (FFPRI), 1 Matsunosato,
Tsukuba, Ibaraki 305-8687, Japan. ³Graduate School of Natural Science and
Technology, Okayama University, Tsushimanaka, Okayama 700-8530, Japan.
⁴Botany Department, Faculty of Science, Suez Canal University, Ismailia
41522, Egypt.

Received: 7 October 2013 Accepted: 9 July 2014 
Published: 19 July 2014 

References 

1. Benson DR, Silvester WB: Biology of Frankia strains, actinomycete 
symbionts of actinorhizal plants. Microbiol Rev 1993, 57:293-319. 



2. Huss-Danell K: Actinorhizal symbioses and their N2 fixation. New Phytol 
1997, 136:375-405. 

3. Kucho K, Hay AE, Normand P: The determinants of the actinorhizal 
symbiosis. Microbes Environ 2010, 25:241-252. 

4. Normand P, Lapierre P, Tisa LS, Gogarten JP, Alloisio N, Bagnarol E, Bassi CA, 
Berry AM, Bickhart DM, Choisne N, Couloux A, Cournoyer B, Cruveiller S, 
Daubin V, Demange N, Francino MP, Goltsman E, Huang Y, Kopp OR, 
Labarre L, Lapidus A, Lavire C, Marechal J, Martinez M, Mastronunzio JE, 
Mullin BC, Niemann J, Pujic P, Rawnsley T, Rouy Z, et al: Genome 
characteristics of facultatively symbiotic Frankia sp. strains reflect host 
range and host plant biogeography. Genome Res 2007, 17:7-15. 

5. Ghinet MG, Bordeleau E, Beaudin J, Brzezinski R, Roy S, Burrus V: Uncovering
the prevalence and diversity of integrating conjugative elements in
actinobacteria. PLoS One 2011, 6:e27846.

6. Bickhart DM, Gogarten JP, Lapierre P, Tisa LS, Normand P, Benson DR: 
Insertion sequence content reflects genome plasticity in strains of the 
root nodule actinobacterium Frankia. BMC Genomics 2009, 10:468. 

7. Normand P, Lalonde M: Evaluation of Frankia strains isolated from
provenances of two Alnus species. Can J Microbiol 1982, 28:1133-1142.

8. Nagashima Y, Tani C, Yamamoto M, Sasakawa H: Host range of Frankia
strains isolated from actinorhizal plants growing in Japan and their
relatedness based on 16S rDNA. Soil Sci Plant Nutr 2008, 54:685-693.

9. Zhang Z, Lopez MF, Torrey JG: A comparison of cultural characteristics
and infectivity of Frankia isolates from root nodules of Casuarina
species. Plant Soil 1984, 78:79-90.

10. Yamanaka T, Mansour SR: Nodulation of Alnus japonica and Casuarina
equisetifolia in liquid culture after inoculation with Frankia. Bull FFPRI
2013, 12:97-103.

11. Lalonde M, Calvert HE, Pine S: Isolation and use of Frankia strains in
actinorhizae formation. In Current perspectives in nitrogen fixation. Edited by
Gibson AH, Newton WE. Canberra: Australian Academy of Sciences;
1981:296-299.

12. Fischbach MA, Walsh CT: Assembly-line enzymology for polyketide and
nonribosomal peptide antibiotics: logic, machinery, and mechanisms.
Chem Rev 2006, 106:3468-3496.

13. Horvath P, Barrangou R: CRISPR/Cas, the immune system of bacteria and 
archaea. Science 2010, 327:167-170. 

14. Hacker J, Kaper JB: Pathogenicity islands and the evolution of microbes.
Annu Rev Microbiol 2000, 54:641-679.

15. Jayapal KP, Lian W, Glod F, Sherman DH, Hu WS: Comparative genomic
hybridizations reveal absence of large Streptomyces coelicolor genomic
islands in Streptomyces lividans. BMC Genomics 2007, 8:229.

16. Welch RA, Burland V, Plunkett G III, Redford P, Roesch P, Rasko D, Buckles EL,
Liou SR, Boutin A, Hackett J, Stroud D, Mayhew GF, Rose DJ, Zhou S, Schwartz
DC, Perna NT, Mobley HL, Donnenberg MS, Blattner FR: Extensive mosaic
structure revealed by the complete genome sequence of uropathogenic
Escherichia coli. Proc Natl Acad Sci USA 2002, 99:17020-17024.

17. Joyce EA, Chan K, Salama NR, Falkow S: Redefining bacterial populations: a 
post-genomic reformation. Nat Rev Genet 2002, 3:462-473. 

18. Thompson JR, Pacocha S, Pharino C, Klepac-Ceraj V, Hunt DE, Benoit J,
Sarma-Rupavtarm R, Distel DL, Polz MF: Genotypic diversity within a natural
coastal bacterioplankton population. Science 2005, 307:1311-1313.

19. Itakura M, Saeki K, Omori H, Yokoyama T, Kaneko T, Tabata S, Ohwada T, 
Tajima S, Uchiumi T, Honnma K, Fujita K, Iwata H, Saeki Y, Hara Y, Ikeda S, 
Eda S, Mitsui H, Minamisawa K: Genomic comparison of Bradyrhizobium 
japonicum strains with different symbiotic nitrogen-fixing capabilities 
and other Bradyrhizobiaceae members. ISME J 2009, 3:326-339. 

20. Benson DR, Dawson JO: Recent advances in the biogeography and 
genecology of symbiotic Frankia and its host plants. Physiol Plant 2007, 
130:318-330. 

21. Zimpfer JF, Smyth CA, Dawson JO: The capacity of Jamaican mine spoils,
agricultural and forest soils to nodulate Myrica cerifera, Leucaena
leucocephala and Casuarina cunninghamiana. Physiol Plant 1997,
99:664-672.

22. Simonet P, Navarro E, Rouvier C, Reddell P, Zimpfer J, Dommergues Y, 
Bardin R, Combarro P, Hamelin J, Domenach AM, Gourbiere F, Prin Y, 
Dawson JO, Normand P: Co-evolution between Frankia populations and 
host plants in the family Casuarinaceae and consequent patterns of 
global dispersal. Environ Microbiol 1999, 1:525-533. 

23. Moran NA: Tracing the evolution of gene loss in obligate bacterial 
symbionts. Curr Opin Microbiol 2003, 6:512-518. 

24. Kuwahara H, Takaki Y, Yoshida T, Shimamura S, Takishita K, Reimer JD, Kato C,
Maruyama T: Reductive genome evolution in chemoautotrophic
intracellular symbionts of deep-sea Calyptogena clams. Extremophiles 2008,
12:365-374.

25. Touchon M, Hoede C, Tenaillon O, Barbe V, Baeriswyl S, Bidet P, Bingen E,
Bonacorsi S, Bouchier C, Bouvet O: Organised genome dynamics in the
Escherichia coli species results in highly diverse adaptive paths.
PLoS Genet 2009, 5:e1000344.

26. den Bakker HC, Desjardins CA, Griggs AD, Peters JE, Zeng Q, Young SK,
Kodira CD, Yandava C, Hepburn TA, Haas BJ, Birren BW, Wiedmann M:
Evolutionary dynamics of the accessory genome of Listeria
monocytogenes. PLoS One 2013, 8:e67511.

27. Hiyoshi T, Sasakawa H, Yatazawa M: Isolation of Frankia strains from root
nodules of Myrica rubra. Soil Sci Plant Nutr 1988, 34:107-116.

28. Murry MA, Fontaine MS, Torrey JG: Growth kinetics and nitrogenase
induction in Frankia sp. HFPArI3 grown in batch culture. Plant Soil 1984,
78:61-78.

29. Mansour SR, Dewedar A, Torrey JG: Isolation, culture, and behavior of
Frankia strain HFPCgI4 from root nodules of Casuarina glauca. Bot Gaz
1990, 151:490-496.

30. Tzean SS, Torrey JG: Spore germination and the life cycle of Frankia
in vitro. Can J Microbiol 1989, 35:801-806.

31. Baker D, O'Keefe D: A modified sucrose fractionation procedure for the 
isolation of Frankia from actinorhizal root nodules and soil samples. 
Plant Soil 1984, 78:23-28. 

32. Kucho K, Kakoi K, Yamaura M, Higashi S, Uchiumi T, Abe M: Transient 
transformation of Frankia by fusion marker genes in liquid culture. 
Microbes Environ 2009, 24:231-240. 

33. Lalonde M, Calvert HE: Production of Frankia hyphae and spores as an
infective inoculant for Alnus species. In Forest Research Laboratory. Edited
by Gordon JC, Wheeler CT, Perry DA. Corvallis: Oregon State University;
1979:95-110.

34. Saitou N, Nei M: The neighbor-joining method: a new method for 
reconstructing phylogenetic trees. Mol Biol Evol 1987, 4:406-425. 

35. Gulig PA, de Crecy-Lagard V, Wright AC, Walts B, Telonis-Scott M, McIntyre
LM: SOLiD sequencing of four Vibrio vulnificus genomes enables
comparative genomic analysis and identification of candidate
clade-specific virulence genes. BMC Genomics 2010, 11:512.

36. Smith TF, Waterman MS: Identification of common molecular
subsequences. J Mol Biol 1981, 147:195-197.



doi:10.1186/1471-2164-15-609

Cite this article as: Kucho et al.: Different dynamics of genome content
shuffling among host-specificity groups of the symbiotic actinobacterium
Frankia. BMC Genomics 2014 15:609.






